最近,网络钓鱼诈骗对块线构成了重大威胁。网络钓鱼探测器指导他们在狩猎网络钓鱼地址方面的努力。大多数检测器通过随机行走或构建静态子图提取目标地址的交易行为特征。随机行走方法,遗憾的是,由于采样序列长度有限,通常会错过结构信息,而静态子图方法倾向于忽略在不断变化的交易行为中忽略时间的时间特征。更重要的是,当恶意用户故意隐藏网络钓鱼行为时,它们的性能经历严重退化。为了解决这些挑战,我们提出了一种动态图分类,从事务演变图(TEGS)中了解了一种动态图形分类器。首先,我们将交易系列转换为多个时间片,在不同时段中捕获目标地址的交易行为。然后,我们提供快速非参数的网络钓鱼检测器,以缩小可疑地址的搜索空间。最后,TEGDetector认为空间和时间的演变,朝着不断变化的交易行为的完整表征。此外,TEGDetector利用了自适应的学习时间系数来向不同的关注点关注不同的时期,这提供了几个新颖的洞察力。大型Etereum Transaction DataSet上的广泛实验表明,该方法实现了最先进的检测性能。
translated by 谷歌翻译
随着激光雷达传感器和3D视觉摄像头的扩散,3D点云分析近年来引起了重大关注。经过先驱工作点的成功后,基于深度学习的方法越来越多地应用于各种任务,包括3D点云分段和3D对象分类。在本文中,我们提出了一种新颖的3D点云学习网络,通过选择性地执行具有动态池的邻域特征聚合和注意机制来提出作为动态点特征聚合网络(DPFA-NET)。 DPFA-Net有两个可用于三维云的语义分割和分类的变体。作为DPFA-NET的核心模块,我们提出了一个特征聚合层,其中每个点的动态邻域的特征通过自我注意机制聚合。与其他分割模型相比,来自固定邻域的聚合特征,我们的方法可以在不同层中聚合来自不同邻居的特征,在不同层中为查询点提供更具选择性和更广泛的视图,并更多地关注本地邻域中的相关特征。此外,为了进一步提高所提出的语义分割模型的性能,我们提出了两种新方法,即两级BF-Net和BF-Rengralization来利用背景前台信息。实验结果表明,所提出的DPFA-Net在S3DIS数据集上实现了最先进的整体精度分数,在S3DIS数据集上进行了语义分割,并在不同的语义分割,部分分割和3D对象分类中提供始终如一的令人满意的性能。与其他方法相比,它也在计算上更有效。
translated by 谷歌翻译
在这项工作中,我们介绍了通过从原始用户行动中端到端学习来彻底改变个性化推荐引擎的旅程。我们编码用户对Pinner-hord的长期兴趣,该用户通过新的致密全行动损失嵌入了为长期的未来操作进行优化的用户,并通过直接从实时动作序列中学习来捕获用户的短期意图。我们进行了离线和在线实验,以验证新模型体系结构的性能,并解决了在生产中使用混合CPU/GPU设置为如此复杂的模型服务的挑战。拟议的系统已在Pinterest的生产中部署,并在有机和广告应用程序中提供了可观的在线收益。
translated by 谷歌翻译
检测几乎重复的图像是照片共享Web应用程序的内容生态系统的基础。但是,当涉及包含数十亿张图像的网络尺度图像语料库时,此类任务是具有挑战性的。在本文中,我们提出了一个有效的系统,用于检测80亿张图像中的近重复图像。我们的系统包括三个阶段:候选人生成,候选人选择和聚类。我们还证明,该系统可用于大大提高许多现实应用程序的建议和搜索结果的质量。此外,我们还包括六年来系统的发展,为新系统如何设计以适应有机内容的增长以及最新技术的方式提供体验和课程。最后,我们正在释放本文介绍的约53,000对图像的人体标记的数据集。
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译